Software Vault: The Gold Collection

Text File | 1993-04-04 | 13KB | 295 lines

Turbo Pascal Record Compress Procedure

Carl A Franz
JFL Consulting
"We will sell no software before it's written"
1115 S. Ridgeland
Oak Park, Il. 60304
(708) 383-1546
CServe: 71041,1512

When unzipping the COUN.ZIP file you should have received:

   1) COUN.PAS     - The Compress/Uncompress source.
   2) TESTCOUN.PAS - Demonstration program for COUN.
   3) COUN.DOC     - This file.

Quite frankly, this is only useful if you have a database like BTrieve or TBTree which allows you to have variable length records in a database. If you can't afford BTrieve (I can't), try TBTree, written by a guy named Dean Farwell (CServe: 73240,3335). I get nothing from Dean for plugging his product, so my opinion of it is untainted. It's great. You can put up a well designed database with Turbo Pascal and the TBTree product. It's much better than the Borland Database Toolkit. I think it's about $25 now. A heck of a bang for your buck.

Anyway, on to this product. The routines in COUN compress out space in your records by removing the extra space in the STRING variables. For instance, if you have a record for an address book like the following:

   AddrBook = Record
      NAME  : String[40];
      ADDR1 : String[40];
      ADDR2 : String[40];
      City  : String[25];
      St    : String[2];
      Zip   : String[9];
   End;

You have allocated 162 bytes; however, rarely is all that space used for actual data. For instance, my name and address use all of 64 bytes. That's a lot of wasted space. Since I am, in fact, building an address book of sorts, and I am planning on keeping several record types in one file, I figured I needed to save some space. Thus I wrote these Compress/Uncompress routines for Turbo Pascal records.

How it works:

There are 2 routines.

   FUNCTION Compress(CMap : STRING; VAR InData; VAR OutData) : INTEGER;

This function accepts a map of your Pascal record (CMap), your record (InData), and someplace you want the compressed record information to go (OutData).
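The idea of squeezing the unused space out of the STRING fields can be shown by hand, without the COUN unit. The following is a minimal sketch, not the COUN.PAS source: the program name, the PackStr helper, the ByteBuf type and the sample data are all made up for illustration. It copies only the length byte and the characters actually in use from each field into a byte array.

```pascal
program PackDemo;
{ Hypothetical illustration only; this is NOT the COUN.PAS source. }

type
  Str255 = String[255];
  AddrBook = record
    Name  : String[40];
    Addr1 : String[40];
    Addr2 : String[40];
    City  : String[25];
    St    : String[2];
    Zip   : String[9];
  end;
  { Worst case: every string is full, so the record size is enough. }
  ByteBuf = array[1..SizeOf(AddrBook)] of Byte;

var
  Rec  : AddrBook;
  Buf  : ByteBuf;
  Used : Integer;

{ Append the length byte plus the characters in use to Buf. }
procedure PackStr(S : Str255; var Buf : ByteBuf; var Used : Integer);
begin
  Move(S, Buf[Used + 1], Length(S) + 1);
  Inc(Used, Length(S) + 1);
end;

begin
  { Made-up sample data }
  Rec.Name  := 'Pat Example';
  Rec.Addr1 := '12 Main St';
  Rec.Addr2 := '';
  Rec.City  := 'Springfield';
  Rec.St    := 'IL';
  Rec.Zip   := '62701';

  Used := 0;
  PackStr(Rec.Name,  Buf, Used);
  PackStr(Rec.Addr1, Buf, Used);
  PackStr(Rec.Addr2, Buf, Used);
  PackStr(Rec.City,  Buf, Used);
  PackStr(Rec.St,    Buf, Used);
  PackStr(Rec.Zip,   Buf, Used);

  WriteLn('Declared size: ', SizeOf(AddrBook), ' bytes');
  WriteLn('Packed size:   ', Used, ' bytes');
end.
```

With Turbo Pascal's byte-packed records the declared size should come out as the 162 bytes mentioned above, and the packed size depends only on how much text each field really holds. Compress does this job for any record layout, driven by the CMap string, so none of this has to be written by hand per record type.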
I highly suggest that the field you use for OutData be a byte array as large as the record you are compressing. The function then returns the length of the compressed record.

   PROCEDURE UnCompress(CMap : STRING; VAR InData; VAR OutData);

This procedure accepts a map of the record (CMap), your compressed byte array (InData), and your record (OutData). I had been considering swapping the positions of InData and OutData so that the calling conventions are the same for COMPRESS and UNCOMPRESS, but didn't. If you want to, go ahead, you've got the source code.

CMap, the record map, is the most complicated part of this mess. To compress and uncompress your record, I need to know what it looks like. To do this is fairly simple. I use the word 'fairly' advisedly. Referring to the Address Book record, the CMap would be 'S40S40S40S25S2S9'. You should get an idea from that. Basically, you tell me, in shorthand, what the fields in the record are. To wit:

   I = INTEGER;   2 bytes   (Case is irrelevant)
   L = LONGINT;   4
   R = REAL;      6
   B = BYTE;      1
   S = STRING;
   C = CHAR;      1
   P = POINTER;   4
   W = WORD;      2

Types not supported are: enumerated types, the single, double, and comp floating-point types, and set types.

'S' may have a length behind it to define the declared length of the string: i.e. STRING[40] is 'S40'. If there is no length following the string identifier 'S', I assume a declared length of 255 characters, the length of a string defined as plain STRING.

A number may be used to define a length of data. If you have 5 byte fields in a row, you can define them either as 'BBBBB' or as '5'. Likewise, if a record contains 2 Integers and a pointer you may define them as 'IIP' or '8'. If you have a STRING[40] followed by 5 byte fields, you must separate them with a comma (','), i.e. 'S40,5'. Let's face it, 'S405' makes no sense. Also, an 'S' followed by a number that is not the string's length must be separated from it by a comma. For example,
if you have a field defined as STRING followed by 5 BYTE fields, then 'S5' would be assumed to be a 5 byte string; 'S,5' is a 255 byte string followed by 5 bytes of whatever.

So, you say you've got arrays. I can handle that. Let's say you'd defined a record thus:

   Rec = Record
      StrArray : Array [1..25] of String[40];
   End;

No problem. Arrays can be defined by brackets ('[',']'). A left bracket '[' followed by the number of items in the array starts an array definition and a right bracket ']' ends it. To wit: '[25s40]' defines an array of 25 40 byte strings (Array [1..25] of STRING[40]). Arrays can also be nested up to 100 levels deep. Actually, I've allowed for 100 levels in my tables but realistically you may have only 100 symbols of any kind in the CMap string. If you find a need to expand the limits, go ahead. The type definitions L1 and L2 are where to change them. These are the CMap parse tables.

There are two fields for flagging errors:

   1) COUNERR, an integer where:
         1 is a memory allocation error.
         2 is an invalid mnemonic error (I don't recognize a record map token character). COUNWHR tells you the character position.
         3 is a bracket mismatch error.
         4 means the CMap is too big.
   2) COUNWHR, an integer that gives the position in the CMap string that caused the trouble.

There are several limits as to what I allow. Records can only be 32000 bytes long. Also, like I said above, the maximum CMAP length is 100 characters. Multi-dimensional arrays are not supported. Oh, you can do it by defining a nested array, but I wouldn't try to define a multi-dimensional array which contains strings unless you really understand how Turbo Pascal allocates memory.

A note about the previous paragraph: There are no good reasons for most of the limitations. I just didn't need anything bigger. If, however, you do decide to make the byte array bigger, there are Turbo Pascal limitations. Integers go to +32K, so indexes need to be changed to LongInt or Word. I'm not sure how big arrays can be, but there is a limit; look it up.
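One way to check a record map before using it is to work out how many bytes it describes and compare that with SizeOf of the record. The following is a minimal sketch under stated assumptions: GetNum here, MapSize and Show are hypothetical helpers written for this example (they are not part of COUN.PAS), the field sizes are the ones from the table above, and there is no error checking at all.

```pascal
program MapSizeDemo;
{ Hypothetical helper, NOT part of COUN.PAS: computes the declared size
  in bytes of the record that a CMap string describes. }

type
  MapStr = String[255];

{ Read a run of digits starting at P; returns -1 if there is none. }
function GetNum(var M : MapStr; var P : Integer) : LongInt;
var
  N : LongInt;
begin
  if (P > Length(M)) or not (M[P] in ['0'..'9']) then
  begin
    GetNum := -1;
    Exit;
  end;
  N := 0;
  while (P <= Length(M)) and (M[P] in ['0'..'9']) do
  begin
    N := N * 10 + Ord(M[P]) - Ord('0');
    Inc(P);
  end;
  GetNum := N;
end;

{ Size of everything from P up to a closing ']' or the end of the map. }
function MapSize(var M : MapStr; var P : Integer) : LongInt;
var
  Total, N, Count : LongInt;
  Ch : Char;
begin
  Total := 0;
  while (P <= Length(M)) and (M[P] <> ']') do
  begin
    Ch := UpCase(M[P]);
    if Ch in ['0'..'9'] then
      Total := Total + GetNum(M, P)       { bare number: that many bytes }
    else
    begin
      Inc(P);
      case Ch of
        'B', 'C' : Inc(Total, 1);
        'I', 'W' : Inc(Total, 2);
        'L', 'P' : Inc(Total, 4);
        'R'      : Inc(Total, 6);
        'S' : begin
                N := GetNum(M, P);
                if N < 0 then N := 255;   { plain STRING }
                Inc(Total, N + 1);        { +1 for the length byte }
              end;
        '[' : begin
                Count := GetNum(M, P);
                Total := Total + Count * MapSize(M, P);
                Inc(P);                   { skip the matching ']' }
              end;
      end;  { a comma matches nothing: it only separates tokens }
    end;
  end;
  MapSize := Total;
end;

procedure Show(M : MapStr);
var
  P : Integer;
begin
  P := 1;
  WriteLn(M, ' describes ', MapSize(M, P), ' bytes');
end;

begin
  Show('S40S40S40S25S2S9');   { the address book record: 162 }
  Show('S40,5');              { STRING[40] + 5 bytes: 46 }
  Show('S,5');                { STRING + 5 bytes: 261 }
  Show('[25s40]');            { 25 x STRING[40]: 1025 }
end.
```

Comparing the result with SizeOf(YourRecord) before calling Compress is a cheap way to catch a map that has drifted out of step with the record declaration. This sketch uses recursion for the brackets purely to stay short; COUN itself is described as table driven.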
Also, the obvious limit to the CMAP is 255 characters. If you come up with any interesting ways around that, let me know.

When compiled, the COUN.PAS unit uses 2264 bytes of code and 53 bytes of data space. If there is enough interest in this (or if Dean Farwell asks me to) I will convert this to TASM assembler. It should then be faster and smaller. If someone else wants to do it, that's fine also. Please send me the code when you're done.

The source code is provided for several reasons.

   1) I like to see what other people are doing; I assume others do too.
   2) If someone comes up with a nifty way of making these routines faster, smaller, more elegant, whatever, I would like to know.

If you use these routines, I don't want money. Well, yes I do. If you feel like sending me a fiver, go ahead. What I really want is to know if anyone finds them useful. Drop me a note. I enjoy chatting with others in the field.

If you, God forbid, find any bugs in these routines, please let me know. I will fix them and get a new version out to you ASAP. I'm very proud of my work, so I really do try hard to provide the best that time will allow. Also, try fixing them yourself; it's good practice.

I have a 20 month old child in the house, so no late night calls. Anything after 10pm CST and I'll probably get quite angry. You're much more likely to get me via CompuServe than by calling on the phone. But if you must, evenings and weekends are the best time. I do not, under any circumstances, accept collect calls. Deal with it.

Biography: (I saw it in someone else's doc and thought it was a good idea)

Carl Franz has been in programming for 13 years. He has written code professionally for Univac, Burroughs, DEC, IBM Mainframe, Z80 CP/M, and IBM PC. Currently I'm a Technical Advisor for a commercial bank. I consult on the side when the mood hits me. The JFL in JFL Consulting stands for 'Just For Laughs' (not really, but you get the point).
Need a utility written? Give me a buzz; if it sounds like fun we can work something out.

Yet again I'm going to plug TBTree. The next version will provide network support. It already provides fixed and variable length record support, record lists, keys of Turbo Pascal variable types, so-so documentation but good example programs. Last I looked, it was in BPROGA Lib 2. It's a big download (about 300K) but worth it. All source code is provided. And, for goodness sake, pay the man his $25; it isn't a lot for what you are getting, and he needs to know if anyone is really using the product. On top of which, as far as I can tell it's bugless.

Good luck and may the farce be with you.

For Algorithm Freaks

The algorithm for the process is kind of brainless. (Brainless means 'Why didn't I think of that earlier'.) Basically, there are 2 tables:

   1) L1 absorbs all necessary information about the tokens in the CMAP string.
   2) L2 allows me to stack Array-Start information to handle nested arrays.

At its most basic there are 6 token types:

   1) 'S', or string, with an optional length;
   2) scalar lengths (numeric values);
   3) any of the rest of the mnemonics which refer to Pascal types;
   4) the start-array left bracket '[' plus an iteration value;
   5) the end-array right bracket ']'; and
   6) the lowly comma.

ParseCMap calls GetToken at the start of each loop. GetToken looks at the next value in CMap and loads LP1T with the token type, whatever the character is, and a length. The length of all Pascal types is obtained via SizeOf, except for String (S), for which GetNum is called to check if there is a numeric character after the 'S'. If there is a numeric character, it absorbs characters from CMap until a non-numeric value is found, converting the mess into an integer. LP1T is later copied to the next item in the L1 table.

The '[' or Start-Array does something a little different. For the most part it works the same as the String 'S' token. Except, when one is found an entry is pushed onto stack L2.
The entry consists of the index value of where the '[' entry is in L1. As '['s are found, each is pushed onto the stack. When an Array-End ']' token is found, an entry in the L2 stack is popped. This entry contains the index location of the matching Start-Array. The Size component of the ']' entry in L1 is then loaded with the location of the matching Start-Array, so that when they are finally processed you will know which entry of the L1 table to return to for iteration.

I have to apologize about the naming conventions. I was rereading some notes on expression parsing and evaluation from college which used the same stupidly cryptic conventions. I wasn't feeling particularly creative at 2:30am so I used them instead of making up better ones.

On the Compress/UnCompress side you step through the L1 array and do what it says. Except: when a Start-Array is found, the iteration count is moved from Size to Decr. Then, upon seeing an End-Array, the Decr of the matching Start-Array is checked. If it is 0, nothing is done and processing continues to the next item; if Decr is not zero, it is decremented and the index of the matching Start-Array is loaded into the L1 index. Remember that the L1 index will be incremented before checking the next L1 entry, so the Start-Array will not actually be processed again during the 'array loop'. Also, each Start-Array has its own Decr, thus nested arrays will process properly.

There is a slightly more efficient way of handling the array loops; however, it involves another integer in L1 and some somewhat more complicated code. Also, I have a blind spot figuring out where I should be with indexes. I'm always one ahead or behind where I should be.
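The array loop described above can be sketched as follows. This is an illustration under stated assumptions, not the COUN.PAS source: the names L1, L2, Size and Decr follow the description, but the record layout, the TokKind type and the AddTok helper are made up, the table is filled in by hand for the map '[3[2S9]I]' instead of being parsed, and Decr is loaded with the count minus one (an assumption, so that each array body runs exactly as many times as the count says). Instead of compressing anything, the walk just adds up the bytes it passes over.

```pascal
program ArrayLoopDemo;
{ Sketch only: names mirror the description (L1, L2, Size, Decr), but the
  layout and details are assumptions, NOT the COUN.PAS source. }

const
  MaxTok = 100;

type
  TokKind = (tkData, tkStart, tkEnd);
  Token = record
    Kind : TokKind;
    Size : Integer;  { tkData: bytes; tkStart: count; tkEnd: index of its '[' }
    Decr : Integer;  { tkStart only: passes still to run }
  end;

var
  L1    : array[1..MaxTok] of Token;    { the parsed map }
  L2    : array[1..MaxTok] of Integer;  { stack of open '[' indexes }
  NTok, Top, I : Integer;
  Total : LongInt;

procedure AddTok(K : TokKind; S : Integer);
begin
  Inc(NTok);
  L1[NTok].Kind := K;
  L1[NTok].Size := S;
  L1[NTok].Decr := 0;
end;

begin
  { Hand-built table for the map '[3[2S9]I]'. }
  NTok := 0;
  Top := 0;
  AddTok(tkStart, 3);  Inc(Top);  L2[Top] := NTok;   { push outer '[' }
  AddTok(tkStart, 2);  Inc(Top);  L2[Top] := NTok;   { push inner '[' }
  AddTok(tkData, 10);                                { S9: 9 chars + length byte }
  AddTok(tkEnd, L2[Top]);  Dec(Top);                 { pop: matches inner '[' }
  AddTok(tkData, 2);                                 { I }
  AddTok(tkEnd, L2[Top]);  Dec(Top);                 { pop: matches outer '[' }

  { Walk the table the way the text describes. }
  Total := 0;
  I := 1;
  while I <= NTok do
  begin
    case L1[I].Kind of
      tkData  : Inc(Total, L1[I].Size);
      tkStart : L1[I].Decr := L1[I].Size - 1;  { one pass is already under way }
      tkEnd   : if L1[L1[I].Size].Decr > 0 then
                begin
                  Dec(L1[L1[I].Size].Decr);
                  I := L1[I].Size;             { Inc(I) below steps past the '[' }
                end;
    end;
    Inc(I);
  end;

  WriteLn('Bytes walked: ', Total);   { 3 * (2 * 10 + 2) = 66 }
end.
```

Because the inner Start-Array is passed again on every outer pass, its Decr is reloaded each time, which is what lets nested arrays work with a single counter per bracket.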